Abstract



Bayesian Inference is a powerful approach to data analysis that is based almost entirely on 
probability theory. In this approach, probabilities model uncertainty rather than randomness 
or variability. This thesis is composed of a series of papers that have been published in various 
astronomical journals during the years 2005-2008. The unifying thread running through the 
papers is the use of Bayesian Inference to solve underdetermined inverse problems in astrophysics. 
Firstly, a methodology is developed to solve a question in gravitational lens inversion - using the 
observed images of gravitational lens systems to reconstruct the undistorted source profile and 
the mass profile of the lensing galaxy. A similar technique is also applied to the task of inferring 
the number and frequency of modes of oscillation of a star from the time series observations 
that are used in the field of asteroseismology. For these complex problems, many of the required 
calculations cannot be done analytically, and so Markov Chain Monte Carlo algorithms have 
been used. Finally, probabilistic reasoning is applied to a controversial question in astrobiology: 
does the fact that life formed quite soon after the Earth constitute evidence that the formation 
of life is quite probable, given the right macroscopic conditions? 
Chapter 1

Introduction 

I refuse to answer that question on the grounds that I don't know the answer. 
- Douglas Adams 

1.1 Forward and Inverse Problems 

Science proceeds by comparing the predictions of theories with observations or experiments
designed to test them (Jeffreys, 1961). If a theory correctly predicts the outcome of some
observations or an experiment, our level of confidence in the theory is increased; otherwise the
theory becomes less plausible^1. In recent years, the increase in computing power has allowed
theorists to carry out increasingly realistic simulations of physical phenomena, predicting very
specific details that are often beyond the reach of current observational and experimental limits.
For instance, in the field of n-body simulations, it is now possible to answer the question "given
certain initial conditions at some time $t_1$, and some assumed laws of physics, what will be the
state of the system at a later time $t_2$?" (Hockney and Eastwood, 1988). However, it is much
harder to answer the reverse question: given all of the observed data, what can now be said
about the laws of physics that have been operating?



This kind of scenario is often referred to as an inverse problem (Aster, Borchers and Thurber,
2004). It is often possible to solve a forward problem, reasoning from some physical assumptions
to a prediction for what would be observed. But reasoning from an observation back to the
correct model is difficult, and usually underdetermined in the sense that there are many possible 
explanations for a given data set. A typical observation rules out many theories but is consistent 
with many others. While many specific inverse problems may be solvable using techniques 
invented separately for each problem, there is a general framework for solving all problems of 
this type. This framework is introduced in the following section. 



1.2 What is Bayesian Inference? 

When probability theory is taught, it is usually introduced without a great deal of discussion as 
to the meaning or interpretation of the quantity "probability". Most introductory applications
are based around games of chance - coin flipping, dice rolling, and card games. In these cases, 
some intuitively obvious assumptions are made, such as the assumption that each of the six 
sides of a die will appear with probability 1/6 and each toss is independent. The probabilities of 
more complex events can then be calculated by applying the mathematical rules of probability 
theory. These rules will now be stated. For any two propositions or events A and B, the product 
rule states 

$$P(A, B) = P(A)\,P(B \mid A) \tag{1.1}$$

where $P(A, B)$ is the probability that both A and B are true, and $P(B \mid A)$ is the probability
that B is true, given that A is true. The sum rule relates $P(A)$ to $P(\bar{A})$, the probability
that A is false:

$$P(A) + P(\bar{A}) = 1 \tag{1.2}$$

As usual, all probabilities are real numbers in the interval [0, 1].

^1 Often, the theory that is being penalised is not some important or fundamental theory, but the
hypothesis that the experiment is doing what it was thought to be doing. In fewer words, the
experiment could be wrong.

The mathematical content of probability theory consists of the above rules and their conse- 
quences. However, for the scientist or applied mathematician, the question of the interpretation 
of probability is of considerable importance, because it determines the range of applicability of 
the above equations. 



1.2.1 Subjective Probabilities 

It is not the purpose of this thesis to repeat the long history of the controversies over the meaning
of probability; the interested reader is referred to Jaynes (2003) and references therein. Although
Jaynes was a partisan, his career took place in the mid to late 20th century, at a time when the
Bayesian view was much less mainstream than it is today.

An important derivation of the rules of probability was provided by Cox (1961, 1946). He was
seeking a generalisation of standard Boolean logic to take uncertainty into account. Applying
a few basic consistency criteria implies that, if the degree of plausibility of a hypothesis H is
modelled by a real number, then the consistency criteria are only met if the rules for combining
plausibilities are equivalent to the rules of probability theory. In other words, the laws of
probability are the unique rules for reasoning in the presence of uncertainty^2. Thus, probability
theory can be used as a mathematical model of our state of knowledge about the plausibility of
various hypotheses. Any hypothesis with a probability of 1 is certain to be true, those with
probability 0 are certain to be false, and any number between these describes some intermediate
level of certainty.

Of course, the degree of plausibility of any proposition depends on the information that is taken 
into account and the assumptions that are being made. Hence, Bayesian probabilities are always 
conditional probabilities. In general, the probability of a hypothesis H given information or
assumptions I, denoted $P(H \mid I)$, is different to the probability of H given different information J,
denoted $P(H \mid J)$. It is certainly the case that different people can disagree about the probability
of the same hypothesis - but only if they have different information, or are making different 
starting assumptions. It is in this sense that probabilities are subjective - but ideally they 
should be assigned based on logical reasoning and all of the available evidence. Occasionally, 
the nature of the evidence can be complicated and qualitative, and assumptions like "that seems 
implausible, so we will assign probability 1/100" will have to be made at the beginning of the 
calculation. If the outcome depends on any ad-hoc judgments such as these, then the answer to 
the problem is that it depends on what you assume, which is a situation not unique to Bayesian 
probability theory. 

^2 The rules of probability can also be justified from requirements on rational betting behaviour,
with no reference to repetitions, frequencies or averages (Caves et al., 2002).

Consider two propositions H and D. Expanding the joint probability of H and D in the two
ways allowed by Equation 1.1 gives

$$P(H, D \mid I) = P(H \mid I)\,P(D \mid H, I) = P(D \mid I)\,P(H \mid D, I) \tag{1.3}$$

where everything is conditional on the background information and/or assumptions I. Whilst I
can be omitted from equations to increase readability, it is always present, and neglecting this
can lead to confusion when comparing two calculations that appear equivalent, but implicitly
depend on different assumptions. Rearranging, we obtain Bayes' Theorem:

$$P(H \mid D, I) = \frac{P(H \mid I)\,P(D \mid H, I)}{P(D \mid I)} \tag{1.4}$$

If H is a hypothesis about nature, and D is a statement about some observed data, this equation
describes how to calculate our plausibility for H in the light of the extra data D: it is equal
to the probability of H before taking into account the data, times $P(D \mid H, I)$, a measure of how
well H would have predicted D to occur, divided by $P(D \mid I)$, the chance that this data
would be observed whether the hypothesis was true or false.
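As a small numerical illustration of Bayes' Theorem, the sketch below updates the plausibility of a single hypothesis H. The function name and all of the numbers are hypothetical, chosen only to show the arithmetic; $P(D \mid I)$ is expanded over H and its negation using the product and sum rules.

```python
def posterior(prior_h, like_d_given_h, like_d_given_not_h):
    """Return P(H | D, I) = P(H | I) P(D | H, I) / P(D | I) for a binary hypothesis."""
    # P(D | I) is a sum over the two mutually exclusive cases: H true, H false.
    evidence = prior_h * like_d_given_h + (1.0 - prior_h) * like_d_given_not_h
    return prior_h * like_d_given_h / evidence


# Hypothetical numbers: H starts with probability 1/100, and the data D are
# 20 times more probable if H is true than if it is false.
p = posterior(prior_h=0.01, like_d_given_h=0.8, like_d_given_not_h=0.04)
print(f"P(H | D, I) = {p:.4f}")
```

Even though the data favour H by a factor of 20, the posterior probability stays well below one half, because the prior probability was small.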



1.3 Using Bayes' Theorem for Estimation 

It is straightforward to generalise Equation 1.4 to the case where there are a large number
of competing mutually exclusive hypotheses. In practice, the most common situation is that
we are interested in learning the value of some quantity or set of quantities $\theta$. Then the
hypotheses we wish to test (and calculate the probabilities of, given some data) are of the form
$\{\theta = 0.300, \theta = 0.301, \theta = 0.302, \ldots\}$ (for discrete variables; continuous variables are
described by density functions where the probability of the true value being in any finite interval
is the integral of the density over that interval). In this case, a version of Bayes' theorem for
probability distributions can be used.

To estimate unknown quantities (denoted collectively by $\theta$) from measurements of other
quantities D, we model our state of knowledge by a joint distribution over the space of possible
values for $\theta$ and D. Then we condition on the values of D that were actually observed, to
calculate the posterior distribution for $\theta$:

$$p(\theta \mid D, I) \propto p(\theta \mid I)\,p(D \mid \theta, I) \tag{1.5}$$

The p's in this equation can be either discrete probability distributions or continuous probability
density functions, depending on the possible values that the $\theta$ and D variables can take. Usually,
rather than choosing a model distribution for $p(\theta \mid D, I)$ directly, it is much easier to assign the
two distributions on the right side of Equation 1.5: $p(\theta \mid I)$, which is called the prior
distribution and describes our state of knowledge about $\theta$ before taking into account the current
data, and $p(D \mid \theta, I)$, which models our predictions about the data given the parameters of
interest. When the actual data D are known and fixed, $p(D \mid \theta, I)$ becomes a function of $\theta$
only and is called the likelihood function.
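For a single parameter, the product of prior and likelihood can be evaluated directly on a grid of candidate values. The following sketch is only illustrative: the simulated data, the uniform prior and the Gaussian sampling distribution with known unit noise are all assumptions made for the example, not choices taken from the papers in this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.3, scale=1.0, size=50)   # hypothetical measurements D

theta = np.linspace(-2.0, 2.0, 4001)             # grid of candidate theta values
dtheta = theta[1] - theta[0]

log_prior = np.zeros_like(theta)                 # uniform prior p(theta | I)
# Gaussian sampling distribution with known unit noise: log p(D | theta, I)
log_like = -0.5 * ((data[:, None] - theta[None, :]) ** 2).sum(axis=0)

log_post = log_prior + log_like                  # prior times likelihood, in logs
post = np.exp(log_post - log_post.max())         # subtract the max to avoid underflow
post /= post.sum() * dtheta                      # normalise to a probability density

post_mean = (theta * post).sum() * dtheta
post_sd = np.sqrt(((theta - post_mean) ** 2 * post).sum() * dtheta)
print(post_mean, post_sd)
```

Working with logarithms keeps the product of fifty likelihood terms numerically stable. With these assumptions the posterior is very nearly Gaussian, centred on the sample mean with standard deviation $1/\sqrt{50}$.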

If $\theta$ is made up of some parameters of interest, x, and some uninteresting parameters, y, a
posterior probability distribution for x alone can be calculated by the process of marginalisation.
Integrating a joint probability distribution over one of the variables yields the marginal
distribution for the other variables:

$$p(x \mid D, I) \propto \int p(x, y \mid D, I)\,dy \tag{1.6}$$

where the integration is over all possible values of y; Figure 1.1 is a graphical representation
of the process of marginalisation. The top left panel shows a joint probability density function
$p(x, y \mid I)$ for two variables, x and y. The result of integrating the joint density along the green
line x = 2 becomes the value of x's marginal density $p(x \mid I)$ at x = 2. From the right hand panels,
it is apparent that if we can acquire a random sample of points in the (x, y) plane sampled from
the joint distribution (here, a sample of size 100), simply ignoring the y values results in a sample
from the marginal distribution for x; this is the motivation behind Markov Chain Monte Carlo
algorithms, which are introduced in section 1.4. For more details on the basic philosophy and
practice of Bayesian Inference, the introductory textbooks by Sivia and Skilling (2006) and Gregory
are recommended. For readers already familiar with the basics, O'Hagan and Forster (2004)
is a useful reference. Jaynes (2003) provides entertaining and thought provoking discussions
of fundamental principles, and MacKay (2003) is full of interesting examples. Applications to
cosmology are reviewed by Trotta (2008).

Figure 1.1: The top left panel shows a joint Gaussian probability density function $p(x, y \mid I)$ for
two variables, x and y, and the corresponding marginal density for x is shown below. If this
was the posterior distribution for two parameters, a sample from this distribution, such as that
shown in the right hand panels, would suffice to estimate quantities such as marginal estimates
and error bars for x. The sample size in this case was 100, and increasing the sample size would
make the sample a more accurate approximation to the continuous densities in the left hand panels.
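The sampling view of marginalisation illustrated in Figure 1.1 can be tried in a few lines. In this sketch the joint distribution is a correlated two-dimensional Gaussian whose mean and covariance are made up for illustration, and the sample is much larger than the 100 points used in the figure so that the summary statistics are stable.

```python
import numpy as np

rng = np.random.default_rng(1)
mean = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 2.0]])     # hypothetical correlated joint Gaussian for (x, y)

sample = rng.multivariate_normal(mean, cov, size=100_000)  # each row is one (x, y)
x = sample[:, 0]                 # marginalise over y simply by ignoring it

# The marginal for x should be a Gaussian with mean 0 and standard deviation 1.
print(x.mean(), x.std())
```

No integral is evaluated explicitly: dropping the y column of the sample is the Monte Carlo counterpart of integrating the joint density over y.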



1.4 Markov Chain Monte Carlo 

The main principles of Bayesian Inference are simple and have been described in the previous 
section. The inputs are the choices of realistic models for the prior distribution and the sampling 
distribution/likelihood function, and the output of interest is a distribution that is proportional 
to their product. However, there are often significant challenges involved in carrying out the 
necessary calculations, particularly the calculation of marginal distributions using Equation 1.6.
In complicated problems, the parameter space (set of possible $\theta$ values) is often high dimensional
- in common applications it can range from a few variables to thousands. Integrating functions 
of many variables is difficult, and in this case the integrand is often sharply peaked, if the data 
constrain the parameters very well. 
To address this challenge, Markov Chain Monte Carlo (MCMC) techniques have been developed
and are becoming very popular (Gilks et al., 1995). The basic idea is to use random number
generation in order to sample points from a distribution that is equal to the posterior
distribution. If each point in the parameter space can be thought of as a model, this means we use a
computer to randomly generate a sample of models for the data, according to their probability
distribution given the data and prior information. Then, properties of the sample can be used to
assess the level of uncertainty remaining about $\theta$, simply by considering measures of the diversity
of the $\theta$ values that were returned by the algorithm.

If $\theta$ is multidimensional, such a sample can easily be used for marginalisation^3: the marginal
distribution for a parameter can be viewed simply by plotting a histogram of the variable of
interest, ignoring the values of the others.

The mathematical definition of a Markov Chain is a set of random variables $\{X_1, X_2, X_3, \ldots\}$
with the special property that the joint probability distribution

$$p(X_1, X_2, \ldots \mid I) = p(X_1 \mid I)\,p(X_2 \mid X_1, I)\,p(X_3 \mid X_2, X_1, I) \cdots \tag{1.7}$$

is such that the conditional distribution for $X_{i+1}$ depends only on the value of $X_i$ and not on
the previous history of the chain. Then the joint distribution can be written as

$$p(X_1, X_2, \ldots \mid I) = p(X_1 \mid I)\,p(X_2 \mid X_1, I)\,p(X_3 \mid X_2, I)\,p(X_4 \mid X_3, I) \cdots \tag{1.8}$$

The idea behind MCMC is to generate an instance of a Markov Chain, starting at some point
$X_1$ in the parameter space, and generating subsequent states from the probability distribution
$p(X_{i+1} \mid X_i, I)$, called the transition kernel (Neal, 1993). The transition kernel is specially
constructed such that as i increases, the marginal probability distribution for $X_i$ tends
towards the distribution that we wish to sample (usually the posterior distribution for some
parameters given data). Less formally, an MCMC algorithm explores the parameter space by a
random walk, but spends more time in regions of higher probability, such that in the long run, the
amount of time spent in any region is proportional to the amount of probability in that region.
Due to the random walk aspect of MCMC, the generated sample of points is not independent,
but for the purposes of calculating parameter estimates and error bars, this is not a major
drawback. As long as the chain is simulated for a long enough time to have generated a few
tens of essentially independent samples, this is usually good enough. There are many different
ways of constructing chains that have this property (Neal, 1993). A sufficient, but not necessary,
condition is that the chain is ergodic (meaning that any state can eventually be reached from
any other), and satisfies the detailed balance condition:

$$\frac{p(X_{i+1} = y \mid X_i = x)}{p(X_{i+1} = x \mid X_i = y)} = \frac{f(y)}{f(x)} \tag{1.9}$$

where f is proportional to the probability density we would like the chain to explore.
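Detailed balance is easiest to check on a small discrete state space, where the transition kernel is just a matrix. The sketch below is an assumed toy example: the five target weights are arbitrary, and the kernel proposes one of the other states uniformly and accepts with probability min(1, f(y)/f(x)). It verifies that f(x) p(y | x) = f(y) p(x | y) for every pair of states and that repeated application of the kernel drives an arbitrary starting distribution to the normalised target.

```python
import numpy as np

f = np.array([1.0, 4.0, 2.0, 0.5, 2.5])   # hypothetical unnormalised target
n = len(f)

# Transition matrix: propose any other state uniformly, accept with min(1, f_y / f_x).
P = np.zeros((n, n))
for x in range(n):
    for y in range(n):
        if x != y:
            P[x, y] = min(1.0, f[y] / f[x]) / (n - 1)
    P[x, x] = 1.0 - P[x].sum()             # rejected proposals leave the state unchanged

flow = f[:, None] * P                      # flow[x, y] = f(x) p(y | x)
print("detailed balance holds:", np.allclose(flow, flow.T))

target = f / f.sum()
dist = np.full(n, 1.0 / n)                 # arbitrary starting distribution
for _ in range(200):
    dist = dist @ P                        # advance the chain by one step
print("converged to target:", np.allclose(dist, target))
```

Here every state can be reached from every other in one step, so the chain is ergodic in the sense used above.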

1.4.1 The Metropolis-Hastings Algorithm 

The Metropolis-Hastings method is one of the simplest and most popular MCMC algorithms
(Metropolis et al., 1953; Hastings, 1970). Suppose that the prior density and likelihood as
functions of $\theta$ are $\pi(\theta)$ and $L(\theta)$ respectively, and these can be evaluated for any point $\theta$ in
the parameter space. If the current state of the simulation is $\theta_i$, the next state, $\theta_{i+1}$, is chosen
as follows. A new value $\theta'$ is generated, chosen from a proposal distribution $q(\theta' \mid \theta)$. Then the
next state of the chain, $\theta_{i+1}$, is equal to $\theta'$ with probability $\alpha$ (called the acceptance
probability); otherwise, with probability $1 - \alpha$, the chain remains in the same state, so
$\theta_{i+1} = \theta_i$. The acceptance probability $\alpha$ is given by

$$\alpha = \min\left[1, \frac{q(\theta \mid \theta')}{q(\theta' \mid \theta)}\,\frac{\pi(\theta')L(\theta')}{\pi(\theta)L(\theta)}\right] \tag{1.10}$$

^3 Although MCMC and other posterior sampling methods may seem slow at times, they are effectively
calculating integrals over very high dimensional spaces - a consoling thought.



In words, this means that the current state is randomly perturbed (according to $q(\theta' \mid \theta)$), and
accepted if the new state of the chain is at a region of higher posterior density $\pi L$. If the posterior
density is lower at the proposed point, it still has a chance of being accepted, that chance being
equal to the ratio of the new density to the old. The extra factor involving q is there to ensure
detailed balance if the proposal density is asymmetric. Commonly, it is symmetric, meaning
that the chance of proposing $\theta'$ if the current state is $\theta$ is the same as proposing the reverse
move if the current state were $\theta'$. It is straightforward to show that the Metropolis updating
rule satisfies detailed balance (Equation 1.9), and is therefore a valid MCMC algorithm. This
means that long Markov chains simulated from the acceptance rule in Equation 1.10 will explore
the parameter space with the fraction of time spent in any volume being proportional to the
total amount of probability contained in that volume.
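The detailed balance property can be verified in two lines. Writing the target density as $f(\theta) \propto \pi(\theta)L(\theta)$ (the symbol f is borrowed from the detailed balance condition), the probability density for a move from $\theta$ to a different point $\theta'$ is the proposal density multiplied by the acceptance probability:

```latex
\begin{align*}
f(\theta)\,q(\theta' \mid \theta)\,\alpha(\theta \to \theta')
  &= f(\theta)\,q(\theta' \mid \theta)\,
     \min\left[1, \frac{q(\theta \mid \theta')\,f(\theta')}{q(\theta' \mid \theta)\,f(\theta)}\right] \\
  &= \min\left[f(\theta)\,q(\theta' \mid \theta),\; f(\theta')\,q(\theta \mid \theta')\right].
\end{align*}
```

The final expression is unchanged when $\theta$ and $\theta'$ are swapped, so the probability flow from $\theta$ to $\theta'$ equals the flow in the reverse direction, which is the detailed balance condition rearranged. Moves that leave the state unchanged satisfy the condition trivially.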

In practice, we need to choose a proposal distribution. A common method is to add a small, 
normally distributed perturbation to the current state. If $\theta$ is one dimensional, this would mean
that $q(\theta' \mid \theta) \sim N(\theta, \sigma^2)$. The typical size of the perturbation, $\sigma$, is chosen by the user, and
affects the performance of the method. If $\sigma$ is too small, most of the proposed moves will be
accepted, but the exploration of the parameter space will be via a slow random walk. If $\sigma$ is
too large, most of the proposed moves will be rejected, and the chain will stay in the same state 
for a long time. A convenient but somewhat wasteful approach is to randomise the width of 
the proposal to a new value at every step. Generally, it is a good idea to aim for about 50% 
of the proposed moves to be accepted. In multidimensional parameter spaces, proposal changes 
can be made that only perturb the value of one parameter, or that perturb multiple parameters 
simultaneously. 
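A minimal sketch of this procedure for a one dimensional parameter is given below. It is not the code used for the papers in this thesis: the function names, the standard normal target, the proposal width of 2 and the chain length are all assumptions made for illustration. Because the Gaussian proposal is symmetric, the ratio of q terms in the acceptance probability cancels.

```python
import numpy as np


def metropolis(log_post, theta0, sigma, n_steps, rng):
    """Random-walk Metropolis with a symmetric Gaussian proposal of width sigma."""
    chain = np.empty(n_steps)
    theta, logp = theta0, log_post(theta0)
    n_accept = 0
    for i in range(n_steps):
        proposal = theta + sigma * rng.normal()      # theta' ~ N(theta, sigma^2)
        logp_new = log_post(proposal)
        # Accept with probability min(1, posterior ratio), compared in logs.
        if np.log(rng.uniform()) < logp_new - logp:
            theta, logp = proposal, logp_new
            n_accept += 1
        chain[i] = theta                             # a rejection repeats the old state
    return chain, n_accept / n_steps


def log_post(theta):
    return -0.5 * theta ** 2       # standard normal target, up to a constant


rng = np.random.default_rng(42)
chain, accept_frac = metropolis(log_post, theta0=0.0, sigma=2.0,
                                n_steps=50_000, rng=rng)
print(chain.mean(), chain.std(), accept_frac)
```

Rerunning with a much smaller or much larger `sigma` shows the two failure modes described above: a high acceptance fraction with slow diffusion, or long runs of repeated states.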

Of course, if the starting point for the chain, $X_1$, is far from the bulk of the probability
distribution, it may take a long time before samples are effectively being taken from that distribution.
This depends on the nature of the target probability distribution and the Metropolis-Hastings
proposal distribution chosen. Generally, if the distribution is not multimodal, an arbitrary
starting point in a low probability region will eventually wander uphill due to the selective pressure^4
exerted by the acceptance probability of Equation 1.10. This initial uphill climb of an MCMC
simulation is referred to as the burn-in period, and $\theta$ values sampled during this period are
usually discarded. Detecting whether an MCMC chain has converged to the target distribution
is a difficult problem and there are no guarantees (Gilks et al., 1995). Visual inspection of
the output is often good enough, and in the author's opinion, is more useful than any formal
criterion.

An example of the Metropolis-Hastings algorithm is presented in Figure 1.2. The target
distribution which the sample should come from was chosen to be a standard normal distribution
(mean 0, standard deviation 1). After an initial burn-in period of about 100 iterations, the
chain successfully samples from the target distribution. Although adjacent samples are not
independent, this simulation contains about as much information about the target density as 100
independent samples would. Amusingly, the presence of rejections in the Metropolis-Hastings
algorithm can cause the output to resemble a city skyline; this is not the origin of the term
Metropolis algorithm^5. If necessary, MCMC algorithms that are more efficient than
straightforward Metropolis-Hastings are available (Murray, 2007).



^4 In this respect, MCMC simulations resemble evolution by natural selection (Dawkins, 1986), and
can also be used to construct optimisation algorithms, such as simulated annealing (Kirkpatrick et
al., 1983).
^5 Apparently, Geraint Lewis thought this was the case for some time.
Throughout this thesis, most of the MCMC uses the Metropolis-Hastings algorithm as
described above, and an extension called Reversible Jump MCMC (O'Hagan and Forster, 2004;
Green, 1995). Reversible jump is used when there is uncertainty, not just about the value of
the parameters, but also about the number of parameters that should be in the model. For
example, in Chapter 7, there is uncertainty not just in the frequencies of some stellar oscillation
modes, but also in how many modes there are in the first place. This is achieved by including
proposal transitions that add an extra component to the model or remove a component from
the model (i.e. increase or decrease the dimensionality of the parameter vector, respectively).
The adding and removing proposals are chosen in order to satisfy detailed balance with respect
to the prior distribution, and are then accepted or rejected using the likelihood ratio only. For
example, when adding an extra component to the model, the extra component is generated from
its prior distribution (given the other components). When removing a component, a particular
component is chosen at random. This is how the MCMC methods in Chapters 4-7 were
implemented.
Figure 1.2: An example of an MCMC run applied to sampling from a standard normal
distribution. The burn-in period is short in this case, about 100 steps, because the target density is
unimodal and the local gradient always influences the chain to move towards the mode. This
helpful property is common but not universal.



The primary application of MCMC methods considered in this thesis is to an inverse problem 
in the study of gravitational lenses. When a massive object lies along the line of sight from 
the observer to a distant source in the universe, the gravitational field of the massive object 
bends the light rays coming from the distant source. As a result, the observed image does 
not accurately reflect the actual morphology of the background source. In addition, the image 
that we see is degraded by blurring caused by atmospheric effects (for example, scintillation, 
where random fluctuations in the conditions in the Earth's atmosphere cause background point 
sources to "twinkle", and extended sources to blur when viewed over timescales greater than 
fractions of a second) and instrumental effects such as the diffraction of the received light as it 
enters the telescope. Additionally, the telescope electronics and the fact that we can sometimes 
observe only small numbers of photons are some common causes of additional random noise in 
the image. The challenge is to take a blurred and noisy observed image and to simultaneously 
infer the original undistorted source profile and the mass profile of the intervening "gravitational 
lens". To do so will require some basic gravitational lensing theory, which is introduced in the 
next section. 






Chapter 2 

Gravitational Lensing 



It's... bending. And I'm... watching it bend.

- Gerry McCambridge, in Psychokinetic Silverware, Gerry and Banachek 

This chapter presents a brief summary of basic gravitational lensing theory. It is not
comprehensive, but covers the main ideas that are used in the subsequent papers. For introductory
purposes, the review article by Wambsganss (1998) is recommended. Alternatively, the reference
work by Schneider et al. (1992) is denser in historical content and technical details.



2.1 Basic Lensing Theory 

The existence of the phenomenon of gravitational lensing is one of the most important predictions
of Einstein's theory of general relativity, implying that the paths of light rays are deflected as the
light passes near a massive object (Schneider et al., 1992). This implies that if a distant point
source is observed, and a massive object lies between the source and the observer, the apparent
position of the source is changed. A diagram illustrating this idea is shown in Figure 2.1. The
distances $D_{ol}$, $D_{os}$ and $D_{ls}$ are the angular diameter distances (Hogg, 1999) from the observer
to the lens, the observer to the source, and the lens to the source. Note that, in general,
$D_{os} \neq D_{ol} + D_{ls}$ on cosmological scales due to general relativistic effects.

If a light source is positioned at a point $\mathbf{R}_s = (X_s, Y_s)$ in the source plane and a light ray from
this source is observed, it will appear to have arrived from a different place, due to the deflection
of the ray by the gravitational lens. The apparent angular position of the light source in the
sky is changed, and its new position is described by its apparent coordinates $\mathbf{R} = (X, Y)$ in the
lens plane. Using simple geometry applied to Figure 2.1, it can be shown that the relationship
between $\mathbf{R}$ and $\mathbf{R}_s$ is given by:

$$\mathbf{R}_s = \frac{D_{os}}{D_{ol}}\mathbf{R} - D_{ls}\mathbf{A}(\mathbf{R}) \tag{2.1}$$

where $D_{os}$, $D_{ol}$ and $D_{ls}$ are the angular diameter distances from the observer to the source,
observer to the lens, and lens to source respectively. The deflection angle $\mathbf{A}(\mathbf{R})$ is a vector
function defined over the lens plane, and its form depends on how the mass in the gravitational
lens is distributed over the lens plane. Note that Equation 2.1 gives a unique source plane
position $\mathbf{R}_s$ for a given image plane position $\mathbf{R}$. In general, though, a unique inverse function
does not exist, so any particular position $\mathbf{R}_s$ in the source plane can be mapped to multiple
positions in the lens plane. This is the mathematical reason why the background source is often
multiply imaged in gravitational lens systems.
Figure 2.1: A standard gravitational lensing situation. The lensing object is assumed to have its
mass distributed in the lens plane, and the light rays undergo a sharp deflection at this plane.
This approximation is valid as long as $D_{ol}$ and $D_{ls}$ are much greater than the extent of the lens
mass distribution along the line of sight (Schneider et al., 1992).



It can be shown from the general theory of relativity that the deflection angle $\mathbf{A}_0(\mathbf{R})$ for a point
mass lens at the origin of the lens plane is given by an inverse 1/R law:

$$\mathbf{A}_0(\mathbf{R}) = \frac{4GM}{c^2}\,\frac{\mathbf{R}}{|\mathbf{R}|^2} \tag{2.2}$$

where G is Newton's gravitational constant, c is the speed of light in vacuum and M is the mass
of the point gravitational lens. Interestingly, a similar result can be derived from either special
relativity or Newtonian mechanics [assuming a $1/r^2$ gravitational field and that light is made
up of particles that travel at c, see Schneider et al. (1992)], although it gives a result that is
smaller by a factor of 2, in conflict with observations.

The result for a point mass lens is generalised to arbitrary continuous mass distributions $\rho(\mathbf{R})$
by integrating the deflection angle due to each small mass element of the gravitational lens,
using Equation 2.2 as a Green's function:

$$\mathbf{A}(\mathbf{R}) = \frac{4G}{c^2}\int \frac{\mathbf{R} - \mathbf{R}'}{|\mathbf{R} - \mathbf{R}'|^2}\,\rho(\mathbf{R}')\,d^2\mathbf{R}' \tag{2.3}$$

For a point mass lens located at $\mathbf{R} = (0, 0)$ and a point source at the origin of the source plane
[$\mathbf{R}_s = (0, 0)$], the observed image would be a circular ring with an angular radius equal to the
angular Einstein Radius

$$\theta_0 = \sqrt{\frac{4GM}{c^2}\,\frac{D_{ls}}{D_{ol}D_{os}}} \tag{2.4}$$

This is because any point in the lens plane at a distance $D_{ol}\theta_0$ (an Einstein Radius in the lens
plane) from the origin is mapped via Equation 2.1 onto the origin of the source plane. The Einstein
Radius provides a convenient length scale for discussing gravitational lensing. Typically, the
coordinates $\mathbf{R}$ and $\mathbf{R}_s$ are replaced by the scaled coordinates:

$$\mathbf{r} = \frac{\mathbf{R}}{D_{ol}\theta_0} \tag{2.5}$$

$$\mathbf{r}_s = \frac{\mathbf{R}_s}{D_{os}\theta_0} \tag{2.6}$$
The scaled deflection angle $\boldsymbol{\alpha}$ is defined as:

$$\boldsymbol{\alpha} = \frac{D_{ls}}{D_{os}\theta_0}\mathbf{A} \tag{2.7}$$

With these changes, the lensing Equation 2.1 takes the following simple form, referred to as the
normalised lens equation:

$$\mathbf{r}_s = \mathbf{r} - \boldsymbol{\alpha}(\mathbf{r}) \tag{2.8}$$
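As a check, the normalised lens equation follows from dividing the unscaled lens equation by $D_{os}\theta_0$. This short derivation assumes the unscaled equation has the form $\mathbf{R}_s = (D_{os}/D_{ol})\mathbf{R} - D_{ls}\mathbf{A}(\mathbf{R})$, that $\theta_0$ denotes the angular Einstein Radius, and that the scaled coordinates are $\mathbf{r} = \mathbf{R}/(D_{ol}\theta_0)$ and $\mathbf{r}_s = \mathbf{R}_s/(D_{os}\theta_0)$:

```latex
\begin{align*}
\frac{\mathbf{R}_s}{D_{os}\theta_0}
  &= \frac{1}{D_{os}\theta_0}
     \left[\frac{D_{os}}{D_{ol}}\mathbf{R} - D_{ls}\mathbf{A}(\mathbf{R})\right] \\
  &= \frac{\mathbf{R}}{D_{ol}\theta_0} - \frac{D_{ls}}{D_{os}\theta_0}\mathbf{A}(\mathbf{R}) \\
\Rightarrow\quad \mathbf{r}_s &= \mathbf{r} - \boldsymbol{\alpha}(\mathbf{r}).
\end{align*}
```

The first term on the right is the scaled image position and the second is the scaled deflection angle, so all of the distance factors are absorbed into the definitions.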

In addition, the surface mass density profile $\rho(\mathbf{R})$ is replaced by the dimensionless surface mass
density $\sigma(\mathbf{r})$ measured in units of $M/(D_{ol}\theta_0)^2$, where M is the mass used in the definition of the
Angular Einstein Radius. This change eliminates the prefactors from the scaled version of
Equation 2.3, which now takes the simple form^1:

$$\boldsymbol{\alpha}(\mathbf{r}) = \int \frac{\mathbf{r} - \mathbf{r}'}{|\mathbf{r} - \mathbf{r}'|^2}\,\sigma(\mathbf{r}')\,d^2\mathbf{r}' \tag{2.9}$$

Since the deflection angle field $\boldsymbol{\alpha}(\mathbf{r})$ is built up of $\mathbf{r}/|\mathbf{r}|^2$ kernels, it must have zero curl and can
therefore be written as the gradient of a potential $\phi(x, y)$. In terms of the lensing potential, the
lens equation is:

$$\mathbf{r}_s = \mathbf{r} - \nabla\phi \tag{2.10}$$
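The multiple imaging implied by the lens equation can be seen explicitly for the simplest case. The sketch below assumes a point-mass lens at the origin whose mass equals the M used to define the Einstein Radius, so that in scaled units the deflection is $\boldsymbol{\alpha}(\mathbf{r}) = \mathbf{r}/|\mathbf{r}|^2$; the function names and the source position are hypothetical. Along the line joining lens and source, the lens equation reduces to the quadratic $s = x - 1/x$, which has two real roots.

```python
import numpy as np


def point_mass_images(r_s):
    """Image positions for a unit point-mass lens and a source at r_s (a 2-vector)."""
    r_s = np.asarray(r_s, dtype=float)
    s = np.linalg.norm(r_s)
    if s == 0.0:
        raise ValueError("source on the optical axis: the image is an Einstein ring")
    direction = r_s / s
    # Along the lens-source axis the lens equation reads s = x - 1/x.
    x_plus = 0.5 * (s + np.sqrt(s ** 2 + 4.0))
    x_minus = 0.5 * (s - np.sqrt(s ** 2 + 4.0))
    return x_plus * direction, x_minus * direction


def deflection(r):
    """Scaled deflection angle alpha(r) = r / |r|^2 of the point mass."""
    r = np.asarray(r, dtype=float)
    return r / np.dot(r, r)


source = np.array([0.3, 0.4])
for image in point_mass_images(source):
    # Mapping each image back through r_s = r - alpha(r) should recover the source.
    print(image, image - deflection(image))
```

One image lies outside the Einstein ring on the same side as the source, and the other lies inside the ring on the opposite side.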



2.1.1 Computation of Extended Images 

Most of the gravitational lensing studies in this thesis are concerned with extended images. In
these cases, the source is modelled by a non-negative surface brightness function $S(x_s, y_s)$ over
the source plane. Surface brightness conservation implies that the observed image due to lensing
is:

$$I(x, y) = S(x_s(x, y), y_s(x, y)) = S(x - \alpha_x(x, y), y - \alpha_y(x, y)) \tag{2.11}$$

If the goal is to predict how such an image would appear on the sky, it would need to be
convolved by a point spread function (PSF) and then divided into pixels, each pixel taking an
amount of light equal to the integral of the surface brightness function $I(x, y)$ over the pixel.
If the source is coincidentally aligned exactly behind a lens galaxy whose projected mass
distribution is circularly symmetric, the symmetry results in an image of a complete ring, called
an Einstein Ring. Slight deviations from this exact scenario result in almost circular, almost
complete rings, as seen in Figure 2.2. More complex lens mass distributions, such as galaxy
clusters, can produce complex image configurations (Figure 2.3).
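The surface brightness mapping can be sketched by evaluating the source profile at the source-plane position of every image pixel. Everything specific in this example is an assumption: the Gaussian source, the point-mass deflection $\boldsymbol{\alpha}(\mathbf{r}) = \mathbf{r}/|\mathbf{r}|^2$ and the grid size. For brevity the image is sampled at pixel centres, and the PSF convolution and the integration over each pixel are omitted.

```python
import numpy as np


def source_brightness(xs, ys, x0=0.1, y0=0.0, width=0.2):
    """Hypothetical circular Gaussian source profile S(x_s, y_s)."""
    return np.exp(-0.5 * ((xs - x0) ** 2 + (ys - y0) ** 2) / width ** 2)


def point_mass_deflection(x, y):
    """alpha(r) = r / |r|^2 for a unit point-mass lens at the origin."""
    r2 = x ** 2 + y ** 2
    return x / r2, y / r2


# Pixel centres of a 200 x 200 image covering [-2, 2] x [-2, 2]; the even
# pixel count keeps the singular point r = 0 off the grid.
n = 200
edges = np.linspace(-2.0, 2.0, n + 1)
centres = 0.5 * (edges[:-1] + edges[1:])
x, y = np.meshgrid(centres, centres)

ax, ay = point_mass_deflection(x, y)
image = source_brightness(x - ax, y - ay)     # I(x, y) = S(x - alpha_x, y - alpha_y)

print(image.shape, image.max())
```

Because the source sits slightly off the lens axis, the resulting image is a nearly complete ring that is brightest on two opposite sides of the lens. Moving `x0` to zero would give a complete Einstein Ring.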



2.1.2 Magnification 

Equation 2.11 is a statement of the conservation of surface brightness. If the flux per unit angular
area on the sky is not changed by an intervening gravitational lens, how can gravitational lensing
produce magnifications? The answer is that gravitational lenses can produce a change in the
apparent area of a source on the sky. Consider a very small image located at (x, y) in the image
plane, having area $dx\,dy$. Its position in the source plane is given by Equation 2.1 and its area
is $dx_s\,dy_s$, where the ratio of areas is given by the Jacobian determinant of the mapping defined
by Equation 2.1:

$$\frac{dx_s}{dx} \times \frac{dy_s}{dy} = \det\begin{pmatrix} 1 - \dfrac{\partial\alpha_x}{\partial x} & -\dfrac{\partial\alpha_x}{\partial y} \\ -\dfrac{\partial\alpha_y}{\partial x} & 1 - \dfrac{\partial\alpha_y}{\partial y} \end{pmatrix} \tag{2.12}$$

^1 Many authors choose to define the dimensionless density by dividing the actual density by the
critical density, which is the density where a uniform disc of matter maps all points in the lens
plane to the same point in the source plane. The critical density is $\sigma_{\rm crit} = \pi^{-1}$.



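An image may be flipped by lensing, resulting in a negative value for the above ratio of
differentials. Also, the magnification is the ratio of the image plane area to the source plane area.
Hence, the actual (positive) magnification of an image at (x, y) in the image plane is given by:

$$\mu = \left|\,\det\begin{pmatrix} 1 - \dfrac{\partial\alpha_x}{\partial x} & -\dfrac{\partial\alpha_x}{\partial y} \\[2ex] -\dfrac{\partial\alpha_y}{\partial x} & 1 - \dfrac{\partial\alpha_y}{\partial y} \end{pmatrix}\right|^{-1} \tag{2.13}$$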



If a single source is multiply imaged, the total magnification (that is, observed total flux over
all images divided by intrinsic flux) is the sum of the $\mu$ values at all of the images.
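
As a worked illustration of equations 2.12 and 2.13, the sketch below evaluates the Jacobian determinant numerically and sums the magnifications of two images. The lens is an assumed singular isothermal sphere with Einstein radius b = 1, and the image positions are those of a hypothetical point source at (0.5, 0); neither comes from the text, and the function names are made up for the example.

```python
import math

def deflection(x, y, b=1.0):
    """Assumed lens: singular isothermal sphere, alpha = b * r_hat (undefined at r = 0)."""
    r = math.hypot(x, y)
    return b * x / r, b * y / r

def source_position(x, y):
    """Lens mapping from the image plane to the source plane."""
    ax, ay = deflection(x, y)
    return x - ax, y - ay

def jacobian_det(x, y, h=1e-5):
    """Determinant of d(x_s, y_s)/d(x, y) by central finite differences (equation 2.12)."""
    xs_r, ys_r = source_position(x + h, y)
    xs_l, ys_l = source_position(x - h, y)
    xs_u, ys_u = source_position(x, y + h)
    xs_d, ys_d = source_position(x, y - h)
    dxs_dx = (xs_r - xs_l) / (2 * h)
    dys_dx = (ys_r - ys_l) / (2 * h)
    dxs_dy = (xs_u - xs_d) / (2 * h)
    dys_dy = (ys_u - ys_d) / (2 * h)
    return dxs_dx * dys_dy - dxs_dy * dys_dx

def magnification(x, y):
    """Positive magnification of a small image at (x, y) (equation 2.13)."""
    return 1.0 / abs(jacobian_det(x, y))

# Both positions map back to the same source position (0.5, 0) under this lens.
images = [(1.5, 0.0), (-0.5, 0.0)]
total = sum(magnification(x, y) for x, y in images)
```

For this assumed lens the determinant is 1 - b/r analytically, so the two images have magnifications 3 and 1 and the total is 4. The determinant is negative at the second image, which marks it as flipped.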



2.1.3 Applications 

One application of gravitational lenses is to use the fact that the source is magnified to our
advantage: the gravitational lens acts as a natural telescope. To achieve this, the inverse problem
must be solved: that is, we must be able to answer the question "what source profiles and lens
mass profiles could possibly have produced this image?". One of the primary goals of this thesis
has been to develop general techniques for answering this question for any given Einstein Ring
image. This was first performed by Kochanek et al. (1989) on the radio ring MG1131+0456,
and in the intervening years much attention has focused on how to improve the algorithms used
in order to extract as much information as possible from the data (e.g. Wallington et al., 1994,
1996; Warren and Dye, 2003; Brewer and Lewis, 2006; Wayth and Webster, 2006). This is the
focus of the gravitational lensing related papers in this thesis.

Considerable research has gone into solving a number of similar problems of gravitational lens 
inversion that focus on the main goal of reconstructing complex lens mass distributions given 
the positions of either the lensed images or statistical information about the shearing of many 
background galaxies (the "weak lensing" regime). These studies often focus on reconstructing 
the mass distribution of clusters of galaxies from their lensing effect (Figure 2.3).
This lens mass distribution, and particularly the level of substructure present, are important
probes of galaxy formation and the nature of dark matter (Koopmans, 2005; Diemand et al.,
2007). Recently, much of the effort has gone into so-called "nonparametric" modelling, where
the unknown lens mass profile is described by a large number of unknown parameters
(Williams and Saha, 2000), an approach that has seen most of its use in multiply-imaged QSO
systems (e.g. Ferreras et al., 2007). "Maximum Entropy" reconstruction of pixellated mass
profiles from weak lensing data has also been proposed and used successfully (Bridle et al., 1998;
Marshall et al., 2002, 2003). It should be noted that, in this context, "Maximum Entropy" has
little relation to the principle of maximum entropy as discussed by Jaynes (2003). Rather, it
refers to a particular choice of prior distribution for image reconstruction that was once thought
to have a special theoretical status, but this is no longer thought to be the case (Skilling, 1998).
Nevertheless, reconstructions based on this prior are often impressive.



An example of a modern Bayesian cluster reconstruction algorithm is described by Jullo et al.
(2007). While parametric models (such as analytical elliptical mass distributions) may be
criticised for assuming too much prior knowledge, pixellated models may be overcorrecting this
defect by giving the model too much freedom. Thus, intermediate approaches have also been
used, where the mass map is built up from basis functions that are broader and smoother than
pixels (Marshall et al., 2003; Peterson et al., 2007; Coe et al., 2008). For computational reasons,
most of these methods do not try to model the entire image and all of the unknown properties
of each extended source. Typically, they reduce the data to a set of image positions (and other
quantities), and try to find a lens mass distribution that can reproduce the reduced data set.
While we note the existence and success of these methods, they should be considered distinct
from the lens reconstruction tasks discussed in this thesis, which focus more on inferring the
morphology of the background source. The primary application is galaxy-galaxy lenses (such as
those displayed in Figure 2.2), where it is often assumed that the lens is simple enough to be
described by a parametric model.

Figure 2.2: Several real images of Einstein Rings (the blue structures) observed with the
Advanced Camera for Surveys (ACS) aboard the Hubble Space Telescope. Developing methods for
discovering the unlensed source profile is the primary topic of this thesis. The lensing galaxies
are also clearly seen in these images. As these are mostly elliptical galaxies they have a much
redder appearance.


Figure 2.3: Strong gravitational lensing by the galaxy cluster 0024+1654, imaged by the Wide
Field Planetary Camera 2 aboard the Hubble Space Telescope (Kneib et al.). The multiple
images of background galaxies can be found amongst the cluster members; they can be identified
by their higher redshift or from a suspiciously lensed shape. This system has been studied
thoroughly, with many papers employing different techniques to recover the mass profile of the
cluster (e.g. Wallington et al., 1995; Tyson et al., 1998; Shapiro and Iliev, 2000), and the source
profile (Colley et al., 1996).

Chapter 3



Strong Gravitational Lens Inversion: 
A Bayesian Approach 

B. J. Brewer and G. F. Lewis, 2006, Astrophysical Journal 637, 608-619. 

A mathematician may say anything he pleases, but a physicist must be at least 
partially sane. 

- J. Willard Gibbs 

The first paper in this thesis, Strong Gravitational Lens Inversion: A Bayesian Approach,
introduces an approach based on MCMC for reconstructing extended sources in gravitational lens
systems. Previous methods in the literature have usually been based on pixellating the source
plane and finding the values of those pixels that minimise the difference (in a least squares sense)
between the predicted image and the image actually observed (Warren and Dye, 2003).
Since the prediction of an image from a model involves a blurring step, information is lost and
the solution to the inverse problem is not unique. This can be addressed by using "regularisation",
where the merit function for minimisation is modified so as to penalise certain kinds of
implausible solutions (Wayth and Webster, 2006). A simpler approach that is effective if the image is
not very high resolution is to model the source parametrically (e.g. Marshall et al., 2007).
In this paper, previous methods based on "regularisation" are reinterpreted in a Bayesian
context, allowing us to show that the prior information implicitly being assumed is not well
suited to astronomical sources, because it does not allow for the fact that we
expect many of the source pixels to be dark. The reinterpretation is possible because regularisation
consists of adding an extra term to the merit function that is to be optimised, and this results in a
merit function that looks like the logarithm of the posterior probability density. However, if the
regularisation term is interpreted as the logarithm of a prior probability density over the space
of pixellated sources, it usually has undesirable features; for example, the linear regularisation
commonly used does not specify that negative values for the pixels are not allowed, or even that
positive values are more probable than negative ones. In this paper, a basic prior distribution
over pixellated sources that takes sky darkness into account is described and applied to simulated
data. It should be noted that this paper addresses only a subset of the kinds of problems that
might be called "strong gravitational lens inversion" - it does not address the reconstruction of
complex mass distributions.
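
The correspondence between regularisation and a prior can be written out explicitly. The notation below is generic and assumed for illustration only (source pixel vector $\mathbf{s}$, data $\mathbf{d}$ with noise levels $\sigma_i$, a linear lensing-plus-blurring operator $L$, regularisation weight $\lambda$ and a symmetric matrix $H$); it is not copied from the paper.

```latex
% Regularised least squares read as a posterior mode (illustrative notation)
\begin{align}
M(\mathbf{s}) &= \tfrac{1}{2}\chi^2(\mathbf{s}) + \lambda\,\mathbf{s}^{\mathsf T} H\,\mathbf{s},
\qquad
\chi^2(\mathbf{s}) = \sum_i \frac{\left(d_i - (L\mathbf{s})_i\right)^2}{\sigma_i^2} \\
-\log p(\mathbf{s}\mid\mathbf{d}) &= \tfrac{1}{2}\chi^2(\mathbf{s}) - \log p(\mathbf{s}) + \text{const}
\quad\Longrightarrow\quad
p(\mathbf{s}) \propto \exp\!\left(-\lambda\,\mathbf{s}^{\mathsf T} H\,\mathbf{s}\right)
\end{align}
```

Minimising $M$ is therefore the same as maximising the posterior under a Gaussian prior. Since this prior satisfies $p(\mathbf{s}) = p(-\mathbf{s})$, it assigns a source and its negative the same prior density, which is the undesirable feature described above.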

All of the code for the numerical work in this paper, and the manuscript itself, was written by
me (BJB), in consultation with my supervisor Geraint Lewis.

Chapter 4



The Einstein Ring 0047-2808 
Revisited: A Bayesian Inversion 

B. J. Brewer and G. F. Lewis, 2006, Astrophysical Journal 651, 8-13. 

How long did it take them to build the telescope? 
- David J. C. MacKay 

This second paper is an application of ideas presented in the first paper (Chapter 3) to real 
data. The Hubble Space Telescope observations of ER 0047-2808 had already been extensively 
analysed beforehand, so the results could be directly compared to those obtained by other 
means. The choice of the prior distribution over pixellated sources was also modified slightly 
to improve the efficiency of the MCMC exploration. The resulting source reconstruction had a 
significant improvement in resolution (by about 50 per cent) compared to previous studies, and 
the lens model parameters were more uncertain because some assumptions could be relaxed and 
the number of unknown source pixels was increased. In particular, the central position of the
lens model was allowed to be a free parameter - although it was found to be consistent with the 
central position of the lens galaxy's light profile after all. This paper provides a demonstration of 
the fact that careful consideration of the available prior information can often lead to improved 
results in data modelling. 

Once again, all of the code for the numerical work in this paper, and the manuscript itself, was 
written by me (BJB), in consultation with my supervisor Geraint Lewis. 

Chapter 5



Unlensing HST Observations of the 
Einstein Ring 1RXS J1131-1231: A
Bayesian Analysis 

B. J. Brewer, G. F. Lewis, Monthly Notices of the Royal Astronomical Society, in press 

Hofstadter's Law: It always takes longer than you expect, even when you take into 
account Hofstadter's Law. 

- Douglas Hofstadter 

A second optical Einstein Ring observed with the HST is the topic of this third paper. This
system posed a number of technical challenges to the technique that was used on 0047-2808.
The relatively large number of pixels in the image (121x121), complex source structure and
presence of light from the lensing galaxy and the AGN of the source galaxy were all issues
that made the inference more difficult than expected. The one source reconstruction that had
been done in the literature before this paper appeared used an unusual reconstruction technique
(Claeskens et al., 2006). This was done by inferring the lens model parameters from the QSO
image positions only, and using this result to back ray-trace the observed image to the source
plane. When different parts of the image disagreed about the value of a source pixel, the median
value was used.

The claim made by Claeskens et al. (2006) that the QSO images alone were stronger constraints
on the lens model than the extended images seemed unlikely given the number and detail of the
extended structures; hence, a significant fraction of this paper is devoted to exploring this issue.
The conclusion of this comparison was that the QSO images are only strong constraints on the
lens parameters if the central position of the lens mass distribution is fixed in advance.
Otherwise, the extended images become far more informative. In general, if the extended source is
simple (for example, one diffuse component), then extended images do not confer any more
information about the lens than point images do. However, if the extended source contains a lot of
substructure (as is the case with J1131), the extended images provide vital extra information.

All of the code for the numerical work in this paper, and the manuscript itself, was written by
me (BJB), in consultation with my supervisor Geraint Lewis.

Chapter 6



A Molecular Einstein Ring at 
z =4.12: Imaging the Quasar Host 
Galaxy of PSS2322+1944 Through a 
Cosmic Lens 

D. A. Riechers, F. Walter, B. J. Brewer, C. L. Carilli, G. F. Lewis, F. Bertoldi, P. Cox,
Astrophysical Journal, in press 

OK, one last time. These are small. But the ones out there are far away. Small... 
far away... ah forget it! 

- Father Ted 



The fourth paper in this thesis is a study of a radio Einstein Ring (Carilli et al., 2003) at redshift
4.12, seen in the wavelength of the Carbon Monoxide 2-1 transition. The data consisted of seven
images at slightly different frequencies, and the image of the ring changes with frequency due to 
velocity structure of the emitting CO gas cloud. Thus, there is not one single unknown source, 
but seven, all lensed by a common mass profile. CO is of particular interest because it is able 
to exist in the same conditions (temperature, etc) as molecular hydrogen - the raw fuel for star 
formation and the powering of active galactic nuclei. 

A modelling challenge encountered in this study was the fact that the noise in the images is 
correlated on length scales of several pixels. Correlated noise models can be computationally 
expensive, so the following simplified approach was used: simply increase the size of the "error 
bars" on the data so that only a fraction of the information in the data is being taken into 
account; where the correct fraction is determined by the effective number of pixels with inde- 
pendent noise values. This approach produced satisfactory results because the large PSF meant 
that the modelling could not overfit structures smaller than the noise correlation scale. 
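
One way to realise the simplified approach described above is sketched here. The text does not state the exact rule that was used, so the square-root scaling and both function names are assumptions: the per-pixel noise level is inflated so that the chi-squared of all pixels carries the weight of only the effectively independent ones.

```python
import math

def inflated_sigma(sigma, n_pixels, n_independent):
    """Assumed rule: scale sigma so that chi^2 is reduced by the factor
    n_independent / n_pixels."""
    if not 0 < n_independent <= n_pixels:
        raise ValueError("need 0 < n_independent <= n_pixels")
    return sigma * math.sqrt(n_pixels / n_independent)

def chi_squared(data, model, sigma):
    """Sum of squared, noise-normalised residuals for a constant noise level."""
    return sum(((d - m) / sigma) ** 2 for d, m in zip(data, model))

# Toy numbers: 100 pixels, of which only about 25 have independent noise.
data = [0.1 * k for k in range(100)]
model = [0.1 * k + 0.5 for k in range(100)]
sigma = 1.0
sigma_eff = inflated_sigma(sigma, n_pixels=100, n_independent=25)
ratio = chi_squared(data, model, sigma_eff) / chi_squared(data, model, sigma)
```

With these toy numbers the error bars double and the chi-squared drops to a quarter of its original value. How the effective number of independent pixels is estimated (for instance, image area divided by the noise correlation area) is a separate modelling choice that this sketch does not address.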
The results revealed for the first time a detailed multi-wavelength reconstruction of the extended 
molecular source, which is a thin disk-like structure with a velocity gradient: i.e. one end of 
the source is redshifted relative to the other end, with an intermediate velocity for the central 
parts. The optical quasar source does not reside at the centre of this disk, so the CO gas is not 
orbiting the central black hole of the galaxy in the simplest possible way. This indicates that the 
structure of the source galaxy is complex, as expected for high redshift star forming galaxies, in 
standard galaxy formation scenarios. 

The observations and data reduction in this paper were undertaken by our collaborators, and I 
(BJB) wrote and used the lens modelling code, producing Figures 4, 5 and 7 (and the results 
shown in Figure 6), with input from my supervisor, Geraint Lewis. I also wrote the parts of 
the text describing the lens inversion technique (section 4) and am responsible for the resultant 
interpretation of the reconstructed source. 

Chapter 7



Bayesian Inference from 
Observations of Solar-Like 
Oscillations 

B. J. Brewer, T. R. Bedding, H. Kjeldsen, D. Stello, Astrophysical Journal 654, 551-557. 

On the subject of stars, all investigations which are not ultimately reducible to
simple visual observations are ... necessarily denied to us ... We shall never be able by
any means to study their chemical composition.

- Auguste Comte, French philosopher (1835)



The following paper is one of two in this thesis that are not concerned with the topic of
gravitational lens inversion. Asteroseismology is the study of the intrinsic oscillations of stars. In
recent years, there has been an explosion in the study of solar-like oscillations in main sequence
stars - this is where the star oscillates in many modes simultaneously and with small
amplitude. For example, the Sun oscillates in many frequencies with a typical period of ~5 minutes
(Cunha et al., 2007). Knowing the frequencies of the oscillation modes of a star is a powerful
constraint on the internal properties of the star.

This process is analogous to hearing the sound of an unknown musical instrument and using
the properties of that sound to infer something about the physical structure of the instrument.
While the sound doesn't provide information about absolutely every aspect of the
instrument, it does provide some information. Similarly, stellar oscillations can tell us something,
but not everything, about the internal structure of the star. Hence, it is also an underconstrained
inverse problem. However, this paper does not deal with the problem of inferring stellar structure
from frequencies; it is instead concerned with inference of the frequencies from observational data
of the variations of the star's brightness or surface velocity with time - basically, fitting multiple
sine waves to noisy data. This is essentially the classic problem of data analysis: given some
noisy data, find out whether there is a signal present, and if so, what its properties are.

While conventional Fourier techniques are usually adequate (taking the Fourier transform of
something close to sinusoidal yields peaks at the frequencies of the oscillations), they can only be
derived from Bayes' theorem under certain assumptions that do not always apply (Bretthorst,
1988), and therefore may not be using all of the information in the data. The approach we
presented in this paper is more generally applicable, and has been applied to observations of three
stars (Bedding et al., 2006, 2007; Carrier et al., 2007). These papers have not been included in
this thesis due to space constraints; in any case, my contribution tended to be small and to
confirm the results of the conventional analysis. Recently, the method presented in this paper
has been criticised on the grounds that we did not take damping and excitation into account in
our modelling (Appourchaux, 2007). However, previous approaches based on the power spectrum
are also implicitly modelling the signal as sinusoids (Bretthorst, 1988), and are also vulnerable
to similar criticisms.
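
To make the idea of fitting sinusoids to unevenly sampled, noisy data concrete, here is a minimal sketch. It is deliberately simpler than the method of the paper: it fits a single sinusoid by maximum likelihood on a grid of trial frequencies rather than sampling a multi-mode posterior with MCMC. The synthetic data, the noise level, the frequency grid and the function names are all made up for illustration.

```python
import math
import random

def sinusoid_log_likelihood(t, y, freq, sigma):
    """Log-likelihood (up to a constant) of y = A cos(2 pi f t) + B sin(2 pi f t) + noise,
    with A and B set to their least-squares values at this trial frequency."""
    c = [math.cos(2 * math.pi * freq * ti) for ti in t]
    s = [math.sin(2 * math.pi * freq * ti) for ti in t]
    cc = sum(ci * ci for ci in c)
    ss = sum(si * si for si in s)
    cs = sum(ci * si for ci, si in zip(c, s))
    yc = sum(yi * ci for yi, ci in zip(y, c))
    ys = sum(yi * si for yi, si in zip(y, s))
    det = cc * ss - cs * cs
    a = (yc * ss - ys * cs) / det          # solution of the 2x2 normal equations
    b = (ys * cc - yc * cs) / det
    rss = sum(yi * yi for yi in y) - (a * yc + b * ys)   # residual sum of squares
    return -0.5 * rss / sigma**2

def best_frequency(t, y, freqs, sigma):
    """Trial frequency with the highest log-likelihood."""
    return max(freqs, key=lambda f: sinusoid_log_likelihood(t, y, f, sigma))

# Synthetic, unevenly sampled data (all values here are invented for the example).
rng = random.Random(1)
t = sorted(rng.uniform(0.0, 10.0) for _ in range(80))
true_freq, sigma = 1.3, 0.3
y = [math.sin(2 * math.pi * true_freq * ti) + rng.gauss(0.0, sigma) for ti in t]
freqs = [0.05 + 0.01 * k for k in range(296)]   # trial frequencies 0.05 ... 3.00
f_hat = best_frequency(t, y, freqs, sigma)
```

No evenly spaced sampling is needed, because the sinusoid is fitted directly in the time domain. Extending this to several modes, with an unknown number of them and with uncertainties on every frequency, is where a sampling approach becomes necessary.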

The ideas and numerical work presented in this paper are my own (BJB), with Tim
Bedding and Hans Kjeldsen providing guidance on asteroseismology and particularly on the
kinds of assumptions that such a program should and should not make. Dennis Stello produced
the simulations of stochastically excited and damped oscillations that were used in the paper.






Chapter 8 



Implications of the Early Formation 
of Life on Earth 

B. J. Brewer, submitted to Astrobiology. 

It is as true in probability theory as in carpentry that introduction of more powerful 
tools brings with it the obligation to exercise a higher level of understanding and 
judgment in using them. If you gave a carpenter a fancy new power tool, he may 
use it to turn out more precise work in greater quantity; or he may just cut off his 
thumb with it. It depends on the carpenter. 

- Edwin T. Jaynes 

This paper stemmed from a discussion with Dr Charlie Lineweaver that took place at the Harley
Wood Winter School held in June 2004 at Coolangatta, Queensland. Dr Lineweaver had given
a talk on the topic of his work in computing the "Galactic Habitable Zone", the times and
places in the galaxy where conditions such as metallicities and low supernova frequencies are
ideal for life (Lineweaver et al., 2004). During the talk, it was suggested that we should expect
life (at least basic forms) to be common in the galaxy, because it formed on Earth a surprisingly
short time after the surface had cooled sufficiently (Lineweaver and Davis, 2002). Whilst this
certainly constituted some evidence in that direction, the confidence with which Dr Lineweaver
expressed his conclusion seemed unjustified. After the talk, I mentioned this to Dr Lineweaver;
however, my thoughts were not sufficiently well-formed to make a convincing case at the time.
This paper presents a better-developed case, showing that because our own existence is more
probable if life forms early, the fact that life did form early is not conclusive - although it does
constitute some useful evidence in this difficult and often speculative field. In this paper, I
revisit the Lineweaver and Davis model and find that their overconfident conclusion resulted
from their unintentional use of a very informative prior distribution.

Chapter 9

Conclusions and Further Work 



This thesis has presented an array of applications of Bayesian statistics to several complex
data analysis problems in astrophysics. While data analysis methods exist for these problems, 
in the past these have tended to be based on simple ideas with limited applicability (such as 
least-squares fitting). The first problem considered was that of inferring the unknown surface 
brightness profile of a gravitationally lensed galaxy, where the image has also been convolved 
with a point-spread function and subject to additive noise. Compared to methods based on 
least squares fitting (with or without regularisation) , our method consistently produces sharper 
(higher resolution) reconstructions of the source; this has been demonstrated strongly in the case 
of ER 0047-2808. Application of our method to RXS J1131-1231 also produced some different 
conclusions to previous work, especially regarding the constraining power of the extended images 
compared to pointlike images. In this system, the large amount of substructure in the extended 
source is extremely valuable for constraining the lens mass distribution, besides being of interest 
in its own right. For the radio (carbon monoxide) Einstein Ring PSS2322+1944 at a redshift of
4.12, the spectacular multi-wavelength source reconstruction presented in this thesis is the first
inversion using the recent new observations by Riechers et al. (2007). The lens inversion for
this system is a necessary step towards studying the complex dynamics of the molecular gas in
the source galaxy. The number of high quality images of Einstein Rings is increasing rapidly
(e.g. Bolton et al., 2008) and the approach demonstrated in this thesis should prove valuable for
studying them.

The primary reason for the success of this new method is that the number of source pixels to 
be inferred is large compared to the number of pixels in the observed image. In this regime, 
the prior information that the method is implicitly assuming becomes important, and ordinary 
least-squares is implicitly assuming (in a sense) no prior information, not even positivity. Some 
regularisation formulas, which can be viewed as priors, amount to strange assumptions of prior 
information (for example, many linear regularisers/Gaussian priors assume each pixel is inde- 
pendent and is just as likely to be positive as it is to be negative). In contrast, this work has 
placed a greater emphasis on the realism of the prior distribution, rather than its analytical 
properties. Whilst this comes at the price of making the resulting models more complicated, 
computing power and clever numerical techniques (in this case, MCMC) have stepped in to 
make the computations feasible after all. 
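
A one-pixel toy problem shows why positivity matters and how MCMC handles a prior that is not analytically convenient. This is a sketch under stated assumptions, not the prior or sampler used in this thesis: the exponential prior on a non-negative flux, the Gaussian noise model, the step size and the function names are all hypothetical.

```python
import math
import random

def log_posterior(s, d, sigma=1.0, scale=1.0):
    """Gaussian likelihood for datum d = s + noise, times an assumed exponential
    prior on s >= 0 (zero prior probability for negative flux)."""
    if s < 0.0:
        return -math.inf
    return -0.5 * ((d - s) / sigma) ** 2 - s / scale

def metropolis(d, n_steps=20000, step=0.5, seed=0):
    """Random-walk Metropolis sampling of the one-pixel posterior."""
    rng = random.Random(seed)
    s = 1.0
    logp = log_posterior(s, d)
    samples = []
    for _ in range(n_steps):
        proposal = s + rng.gauss(0.0, step)
        logp_new = log_posterior(proposal, d)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < logp_new - logp:
            s, logp = proposal, logp_new
        samples.append(s)
    return samples

# A noisy measurement that happens to be negative: least squares would return -0.5.
samples = metropolis(d=-0.5)
posterior_mean = sum(samples) / len(samples)
```

Every sample is non-negative by construction, so the posterior mean is a small positive flux even though the least-squares estimate is negative. With many pixels and a lens model the same idea applies, but the sampler needs to be considerably more sophisticated.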

Of course, all of this source reconstruction work relies on the assumption that simple analytical
lens models are applicable. For isolated galaxy lenses, this is probably not a problem, but it
would be highly inappropriate to apply this method as it stands to source reconstructions of
galaxies lensed by galaxy clusters. Substructure in galaxy cluster lenses is a major topic of
interest, and in principle it should be possible to detect substructure with an approach like
the one used here. The only reason it cannot be done presently is the immense computing
power required (other methods such as PixeLens (Williams and Saha, 2000) and LensPerfect
(Coe et al., 2008) are feasible because they condition on a reduced data set, such as several
image positions, rather than conditioning on the value of every pixel in a large image). A
multitude of new Bayesian computation techniques are being developed, such as likelihood-free
computation (e.g. Sisson et al., 2007), that may be able to contribute to progress in this field.

A similar story holds for the asteroseismology parts of this thesis: when commonplace data
analysis methods are interpreted in Bayesian terms, the conditions that must hold for them to
be applicable become more apparent. For example, it has always been recognised that Fourier
analysis of a (non-equally-spaced) time series containing a sinusoidal signal may be difficult due
to the phenomenon of aliasing. The method presented in this thesis can effectively accomplish
automatic de-aliasing and tell us the uncertainty of any conclusion, allowing for the recovery of
modes that may have been discarded as noise by a Fourier or CLEAN analysis. However, this
method is also explicitly using an assumption of sinusoidal signals (Appourchaux, 2007). Work
has commenced on generalising this approach to take into account the "semi-regular" behaviour
of solar-like oscillation signals (Brewer and Stello 2008, in preparation).

The algorithms used in this thesis tend to be computationally expensive if the number of
pixels/light-atoms/modes is large. There are many more advanced MCMC techniques that
could be implemented to improve the efficiency of the sampling and thereby extend the range
of systems that could be treated using this approach. Two techniques that may be helpful
in this regard are Hamiltonian Monte Carlo (Neal, 1993) and more advanced implementations
of Reversible Jump MCMC (Green, 1995), including split/merge operations. For gravitational
lens inversion, it should be possible to extend the algorithms in order to model the lens in a
"nonparametric" way, similar to that employed for the source light profile. This would allow for
the detection of substructure without making strong prior assumptions.

Finally, the Bayesian formalism was applied to the study of the early formation of life on Earth,
in order to quantify the significance of this information for the question of whether life is common 
in the universe. This study highlighted an unintentional assumption in a previous analysis that 
caused the authors to reach overconfident conclusions. The work presented in this thesis corrects 
this mistake and shows that the current data do not decisively rule out the possibility that we 
are alone, although they do disfavour this possibility. 

In the limit when all of our data become very good, all of these efforts will become irrelevant: 
but the most active parts of science are always those where the questions we are asking are 
not clearly and unambiguously determined by our current data. For this reason, the question 
of how to analyse noisy and incomplete data, to combine information from different sources, 
and to honestly express the implications of that information, will always be with us. As more 
and more advanced instruments are made, and observations planned, more sophisticated data 
analysis tools and a deeper understanding of their rationale will be required. The recent rise 
in Bayesian activity amongst astronomers suggests that these challenges are widely recognised. 
Consequently, Bayesian Inference is becoming an important and widely-used tool of
modern astronomical research.